
18 CHAPTER 1 Mathematical Preliminaries and Error Analysis
In our traditional mathematical world we permit numbers with an infinite number of
digits. The arithmetic we use in this world defines
√
3 as that unique positive number that
when multiplied by itself produces the integer 3. In the computational world, however, each
representable number has only a fixed and finite number of digits. This means, for example,
that only rational numbers—and not even all of these—can be represented exactly. Since
√
3 is not rational, it is given an approximate representation, one whose square will not
be precisely 3, although it will likely be sufficiently close to 3 to be acceptable in most
situations. In most cases, then, this machine arithmetic is satisfactory and passes without
notice or concern, but at times problems arise because of this discrepancy.
Error due to rounding should be
expected whenever computations
are performed using numbers that
are not powers of 2. Keeping this
error under control is extremely
important when the number of
calculations is large.
The error that is produced when a calculator or computer is used to perform real-
number calculations is called round-off error. It occurs because the arithmetic per-
formed in a machine involves numbers with only a finite number of digits, with the re-
sult that calculations are performed with only approximate representations of the actual
numbers. In a computer, only a relatively small subset of the real number system is used
for the representation of all the real numbers. This subset contains only rational numbers,
both positive and negative, and stores the fractional part, together with an exponential
part.
Binary Machine Numbers
In 1985, the IEEE (Institute for Electrical andElectronic Engineers) published a reportcalled
Binary Floating Point Arithmetic Standard 754–1985. An updated version was published
in 2008 as IEEE 754-2008. This provides standards for binary and decimal floating point
numbers, formats for data interchange, algorithms for rounding arithmetic operations, and
for the handling of exceptions. Formats are specified for single, double, and extended
precisions, and these standards are generally followed by all microcomputer manufacturers
using floating-point hardware.
A 64-bit (binary digit) representation is used for a real number. The first bit is a sign
indicator, denoted s. This is followed by an 11-bit exponent, c, called the characteristic,
and a 52-bit binary fraction, f , called the mantissa. The base for the exponent is 2.
Since 52 binary digits correspond to between 16 and 17 decimal digits, we can assume
that a number represented in this system has at least 16 decimal digits of precision. The
exponent of 11 binary digits gives a range of 0 to 2
11
−1 = 2047. However, using only posi-
tive integers for the exponent would not permit an adequate representation of numbers with
small magnitude. To ensure that numbers with small magnitude are equally representable,
1023 is subtracted from the characteristic, so the range of the exponent is actually from
−1023 to 1024.
To save storage and provide a unique representation for each floating-point number, a
normalization is imposed. Using this system gives a floating-point number of the form
(−1)
s
2
c−1023
(1 + f).
Illustration Consider the machine number
0 10000000011 1011100100010000000000000000000000000000000000000000.
The leftmost bit is s = 0, which indicates that the number is positive. The next 11 bits,
10000000011, give the characteristic and are equivalent to the decimal number
c = 1 · 2
10
+ 0 · 2
9
+···+0 · 2
2
+ 1 · 2
1
+ 1 · 2
0
= 1024 +2 +1 = 1027.
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.